content moderation
CulturePark: Boosting Cross-cultural Understanding in Large Language Models
Cultural bias is pervasive in many large language models (LLMs), largely due to the deficiency of data representative of different cultures. Typically, cultural datasets and benchmarks are constructed either by extracting subsets of existing datasets or by aggregating from platforms such as Wikipedia and social media. However, these approaches are highly dependent on real-world data and human annotations, making them costly and difficult to scale. Inspired by cognitive theories on social communication, this paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection. CulturePark simulates cross-cultural human communication with LLM-based agents playing roles in different cultures. It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs. Using CulturePark, we generated 41,000 cultural samples to fine-tune eight culture-specific LLMs. We evaluated these models across three downstream tasks: content moderation, cultural alignment, and cultural education. Results show that for content moderation, our GPT-3.5-based
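As a rough illustration of the multi-agent setup described above, the sketch below has two LLM-backed agents, each prompted with a different cultural persona, take turns discussing a topic; the transcript becomes candidate fine-tuning data. The chat loop, persona prompts, and model name are assumptions for illustration, not CulturePark's actual implementation.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def run_dialogue(culture_a, culture_b, topic, turns=4):
        personas = {
            culture_a: f"You are a person from {culture_a}. Discuss the topic from your cultural perspective.",
            culture_b: f"You are a person from {culture_b}. Discuss the topic from your cultural perspective.",
        }
        transcript = [("moderator", f"Topic: {topic}")]
        for i in range(turns):
            speaker = culture_a if i % 2 == 0 else culture_b
            # Each agent sees the whole dialogue so far, with its own past turns
            # as "assistant" messages and everyone else's as "user" messages.
            messages = [{"role": "system", "content": personas[speaker]}]
            for who, text in transcript:
                role = "assistant" if who == speaker else "user"
                messages.append({"role": role, "content": text})
            reply = client.chat.completions.create(
                model="gpt-3.5-turbo", messages=messages
            ).choices[0].message.content
            transcript.append((speaker, reply))
        return transcript  # candidate cross-cultural samples for fine-tuning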
WIRED Roundup: DOGE Isn't Dead, Facebook Dating Is Real, and Amazon's AI Ambitions
In this episode of Uncanny Valley, we bring you the news of the week, then dive into how some DOGE operatives are still at work in the federal government--despite reports claiming otherwise. Uncanny Valley host Zoë Schiffer is joined by senior editor Leah Feiger to discuss five stories you need to know about this week, from how Amazon is trying to catch up in the AI race to why Facebook Dating is more popular than ever. Then, they dive into how--despite recent reports claiming that it's over--DOGE operatives are still very much working across federal agencies. Write to us at uncannyvalley@wired.com.

Zoë Schiffer: Today on the show, we're bringing you five stories that you need to know about this week, including how, despite some reports claiming that the so-called Department of Government Efficiency is pretty much over, DOGE people are actually still at work across federal agencies. I'm joined today by our senior politics editor, Leah Feiger. How are you doing today?

Leah Feiger: I am great because I've spent the day with you, but our gentle listeners don't know that.

Zoë Schiffer: So the first story this week is one that I saw and I thought, you know what? Leah's going to want to talk about Amazon's artificial intelligence prowess.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Slovakia (0.04)
- Europe > Czechia (0.04)
When Harmful Content Gets Camouflaged: Unveiling Perception Failure of LVLMs with CamHarmTI
Li, Yanhui, Zhou, Qi, Xu, Zhihong, Guo, Huizhong, Wang, Wenhai, Wang, Dongxia
Large vision-language models (LVLMs) are increasingly used for tasks where detecting multimodal harmful content is crucial, such as online content moderation. However, real-world harmful content is often camouflaged, relying on nuanced text-image interplay, such as memes or images with embedded malicious text, to evade detection. This raises a key question: can LVLMs perceive such camouflaged harmful content as sensitively as humans do? In this paper, we introduce CamHarmTI, a benchmark for evaluating LVLM ability to perceive and interpret camouflaged harmful content within text-image compositions. CamHarmTI consists of over 4,500 samples across three types of image-text posts. Experiments on 100 human users and 12 mainstream LVLMs reveal a clear perceptual gap: humans easily recognize such content (e.g., over 95.75% accuracy), whereas current LVLMs often fail (e.g., ChatGPT-4o achieves only 2.10% accuracy). Moreover, fine-tuning experiments demonstrate that CamHarmTI serves as an effective resource for improving model perception, increasing accuracy by 55.94% for Qwen2.5VL-7B. Attention analysis and layer-wise probing further reveal that fine-tuning enhances sensitivity primarily in the early layers of the vision encoder, promoting a more integrated scene understanding. These findings highlight the inherent perceptual limitations in LVLMs and offer insight into more human-aligned visual reasoning systems.
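To make the measurement concrete, here is a minimal sketch of the kind of evaluation such a benchmark implies: show an LVLM an image-text post, ask for a harmful/benign verdict, and score accuracy against gold labels. The dataset fields, prompt wording, and naive answer parsing are assumptions for illustration, not the paper's exact protocol.

    import base64
    from openai import OpenAI

    client = OpenAI()

    def classify(image_path, caption):
        with open(image_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"Post caption: {caption}\n"
                             "Is this post harmful? Answer 'harmful' or 'benign'."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        # Naive parse: any answer containing "harmful" counts as a harmful verdict.
        return "harmful" if "harmful" in resp.choices[0].message.content.lower() else "benign"

    def accuracy(samples):
        # samples: list of (image_path, caption, gold_label) triples
        hits = sum(classify(img, cap) == gold for img, cap, gold in samples)
        return hits / len(samples)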
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.87)
AWS CEO Matt Garman Wants to Reassert Amazon's Cloud Dominance in the AI Era
As Google and Microsoft continue to surge, the AWS chief lays out his pitch: cheaper, reliable AI delivered at hyperscale. You might think Amazon's biggest swing in the AI race was its $8 billion investment in Anthropic. But AWS has also been building in-house foundation models, new chips, massive data centers, and agents meant to keep enterprise customers locked inside its ecosystem. The company believes these offerings will give it an edge as businesses of all shapes and sizes deploy AI in the real world. WIRED sat down with AWS CEO Matt Garman ahead of the company's annual re:Invent conference in Las Vegas to discuss his AI vision, and how he plans to extend Amazon's lead in the cloud market over its fast-rising competitors, Microsoft and Google.
- North America > United States > Nevada > Clark County > Las Vegas (0.25)
- North America > United States > California (0.15)
- Asia > Nepal (0.15)
- (2 more...)
- Information Technology > Services (1.00)
- Retail > Online (0.78)
- Information Technology > Cloud Computing (1.00)
- Information Technology > Communications > Mobile (0.70)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.41)
Ideology-Based LLMs for Content Moderation
Civelli, Stefano, Bernardelle, Pietro, Pratama, Nardiena A., Demartini, Gianluca
Large language models (LLMs) are increasingly used in content moderation systems, where ensuring fairness and neutrality is essential. In this study, we examine how persona adoption influences the consistency and fairness of harmful content classification across different LLM architectures, model sizes, and content modalities (language vs. vision). At first glance, headline performance metrics suggest that personas have little impact on overall classification accuracy. However, a closer analysis reveals important behavioral shifts. Personas with different ideological leanings display distinct propensities to label content as harmful, showing that the lens through which a model "views" input can subtly shape its judgments. Further agreement analyses highlight that models, particularly larger ones, tend to align more closely with personas from the same political ideology, strengthening within-ideology consistency while widening divergence across ideological groups. To show this effect more directly, we conducted an additional study on a politically targeted task, which confirmed that personas not only behave more coherently within their own ideology but also exhibit a tendency to defend their perspective while downplaying harmfulness in opposing views. Together, these findings highlight how persona conditioning can introduce subtle ideological biases into LLM outputs, raising concerns about the use of AI systems that may reinforce partisan perspectives under the guise of neutrality.
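A small sketch of the persona-conditioning setup under study: classify the same post under differently leaning personas and compare the labels, since systematic divergence across personas is exactly the behavioral shift the paper measures. The persona prompts and model choice are assumptions for illustration, not the authors' exact setup.

    from openai import OpenAI

    client = OpenAI()

    PERSONAS = {
        "left-leaning": "You are a politically left-leaning content moderator.",
        "right-leaning": "You are a politically right-leaning content moderator.",
        "neutral": "You are a strictly neutral content moderator.",
    }

    def judge(persona, text):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": PERSONAS[persona]},
                {"role": "user",
                 "content": f"Label this post as HARMFUL or NOT_HARMFUL:\n{text}"},
            ],
        )
        return resp.choices[0].message.content.strip()

    def persona_labels(text):
        # Divergent labels across personas expose ideology-dependent judgments.
        return {name: judge(name, text) for name in PERSONAS}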
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Oceania > Australia > Queensland > Brisbane (0.04)
- Asia > Middle East > Jordan (0.04)
- (3 more...)
- Government (0.67)
- Education (0.67)
- Telecommunications (0.46)
- Leisure & Entertainment (0.46)
MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance
Goyal, Agam, Zhan, Xianyang, Chen, Yilun, Saha, Koustuv, Chandrasekharan, Eshwar
Large language models (LLMs) have shown great potential in flagging harmful content in online communities. Yet, existing approaches for moderation require a separate model for every community and are opaque in their decision-making, limiting real-world adoption. We introduce Mixture of Moderation Experts (MoMoE), a modular, cross-community framework that adds post-hoc explanations to scalable content moderation. MoMoE orchestrates four operators -- Allocate, Predict, Aggregate, Explain -- and is instantiated as seven community-specialized experts (MoMoE-Community) and five norm-violation experts (MoMoE-NormVio). On 30 unseen subreddits, the best variants obtain Micro-F1 scores of 0.72 and 0.67, respectively, matching or surpassing strong fine-tuned baselines while consistently producing concise and reliable explanations. Although community-specialized experts deliver the highest peak accuracy, norm-violation experts provide steadier performance across domains. These findings show that MoMoE yields scalable, transparent moderation without needing per-community fine-tuning. More broadly, they suggest that lightweight, explainable expert ensembles can guide future NLP and HCI research on trustworthy human-AI governance of online communities.
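A compact sketch of the four-operator pipeline named in the abstract (Allocate, Predict, Aggregate, Explain). The uniform routing, weighted-average aggregation, and keyword stub experts below are assumptions for illustration; the authors' operators and community-specialized experts are learned models.

    # Each expert maps a comment to P(violation); in practice these would be
    # fine-tuned, community- or norm-violation-specialized classifiers.
    def moderate(experts, comment, explain_fn, threshold=0.5):
        routed = [(name, fn, 1.0 / len(experts)) for name, fn in experts.items()]     # Allocate
        scored = [(name, w, fn(comment)) for name, fn, w in routed]                   # Predict
        p_violation = sum(w * p for _, w, p in scored) / sum(w for _, w, _ in scored) # Aggregate
        verdict = p_violation >= threshold
        rationale = explain_fn(comment, verdict, scored)                              # Explain
        return verdict, p_violation, rationale

    # Toy usage with stub experts and a template-based explainer:
    experts = {
        "community-expert": lambda c: 0.9 if "spam" in c.lower() else 0.1,
        "normvio-expert": lambda c: 0.8 if "idiot" in c.lower() else 0.2,
    }
    explain = lambda c, v, s: (
        f"{'Remove' if v else 'Keep'}: expert scores {[(n, round(p, 2)) for n, _, p in s]}"
    )
    print(moderate(experts, "You absolute idiot, click my spam link", explain))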
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- (5 more...)
- Law (0.68)
- Health & Medicine (0.46)
Quantifying Feature Importance for Online Content Moderation
Tessa, Benedetta, Moreo, Alejandro, Cresci, Stefano, Fagni, Tiziano, Sebastiani, Fabrizio
Accurately estimating how users respond to moderation interventions is paramount for developing effective and user-centred moderation strategies. However, this requires a clear understanding of which user characteristics are associated with different behavioural responses, which is the goal of this work. We investigate the informativeness of 753 socio-behavioural, linguistic, relational, and psychological features, in predicting the behavioural changes of 16.8K users affected by a major moderation intervention on Reddit. To reach this goal, we frame the problem in terms of "quantification", a task well-suited to estimating shifts in aggregate user behaviour. We then apply a greedy feature selection strategy with the double goal of (i) identifying the features that are most predictive of changes in user activity, toxicity, and participation diversity, and (ii) estimating their importance. Our results allow identifying a small set of features that are consistently informative across all tasks, and determining that many others are either task-specific or of limited utility altogether. We also find that predictive performance varies according to the task, with changes in activity and toxicity being easier to estimate than changes in diversity. Overall, our results pave the way for the development of accurate systems that predict user reactions to moderation interventions. Furthermore, our findings highlight the complexity of post-moderation user behaviour, and indicate that effective moderation should be tailored not only to user traits but also to the specific objective of the intervention.
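A minimal sketch of the greedy selection loop described above, paired with a simple classify-and-count quantifier as a stand-in for estimating aggregate behavioral shifts. Both the quantifier and the prevalence-error measure are assumptions for illustration, not the authors' exact method.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def quantification_error(X_tr, y_tr, X_te, y_te, cols):
        clf = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
        est_prev = clf.predict(X_te[:, cols]).mean()  # classify-and-count prevalence
        return abs(est_prev - y_te.mean())            # absolute prevalence error

    def greedy_select(X_tr, y_tr, X_te, y_te, k=10):
        selected, remaining = [], list(range(X_tr.shape[1]))
        for _ in range(k):
            # Add the single feature that most reduces quantification error;
            # the error drop at each step doubles as an importance estimate.
            errs = {f: quantification_error(X_tr, y_tr, X_te, y_te, selected + [f])
                    for f in remaining}
            best = min(errs, key=errs.get)
            selected.append(best)
            remaining.remove(best)
        return selected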
- North America > United States > Texas > Travis County > Austin (0.14)
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
- North America > United States > Virginia (0.04)
- (3 more...)
- Health & Medicine (0.93)
- Information Technology > Security & Privacy (0.67)
- Media > News (0.67)
SMARTER: A Data-efficient Framework to Improve Toxicity Detection with Explanation via Self-augmenting Large Language Models
Nghiem, Huy, Sachdeva, Advik, Daumé, Hal III
WARNING: This paper contains examples of offensive materials. To address the proliferation of toxic content on social media, we introduce SMARTER, a data-efficient two-stage framework for explainable content moderation using Large Language Models (LLMs). In Stage 1, we leverage LLMs' own outputs to generate synthetic explanations for both correct and incorrect labels, enabling alignment via preference optimization with minimal human supervision. In Stage 2, we refine explanation quality through cross-model training, allowing weaker models to align stylistically and semantically with stronger ones. Experiments on three benchmark tasks -- HateXplain, Latent Hate, and Implicit Hate -- demonstrate that SMARTER enables LLMs to achieve up to a 13.5% macro-F1 improvement over standard few-shot baselines while using only a fraction of the full training data. Our framework offers a scalable strategy for low-resource settings by harnessing LLMs' self-improving capabilities for both classification and explanation.
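To illustrate Stage 1, the sketch below asks an LLM to explain both the correct and an incorrect label for a post, yielding (chosen, rejected) pairs of the shape consumed by DPO-style preference optimization. The prompt wording and pair format are assumptions for illustration, not the paper's exact pipeline.

    from openai import OpenAI

    client = OpenAI()

    def explain(text, label):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user",
                       "content": f"Post: {text}\n"
                                  f"Briefly explain why this post should be labeled '{label}'."}],
        )
        return resp.choices[0].message.content

    def preference_pair(text, gold_label, wrong_label):
        return {
            "prompt": text,
            "chosen": explain(text, gold_label),     # explanation of the correct label
            "rejected": explain(text, wrong_label),  # explanation of the incorrect label
        }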
- Oceania > Australia (0.04)
- North America > United States > Maryland (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (3 more...)
- Law > Civil Rights & Constitutional Law (0.68)
- Law Enforcement & Public Safety > Terrorism (0.46)
Asking For It: Question-Answering for Predicting Rule Infractions in Online Content Moderation
Samory, Mattia, Pamfile, Diana, To, Andrew, Phadke, Shruti
Online communities rely on a mix of platform policies and community-authored rules to define acceptable behavior and maintain order. However, these rules vary widely across communities, evolve over time, and are enforced inconsistently, posing challenges for transparency, governance, and automation. In this paper, we model the relationship between rules and their enforcement at scale, introducing ModQ, a novel question-answering framework for rule-sensitive content moderation. Unlike prior classification or generation-based approaches, ModQ conditions on the full set of community rules at inference time and identifies which rule best applies to a given comment. We implement two model variants - extractive and multiple-choice QA - and train them on large-scale datasets from Reddit and Lemmy, the latter of which we construct from publicly available moderation logs and rule descriptions. Both models outperform state-of-the-art baselines in identifying moderation-relevant rule violations, while remaining lightweight and interpretable. Notably, ModQ models generalize effectively to unseen communities and rules, supporting low-resource moderation settings and dynamic governance environments.
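A small sketch of the multiple-choice idea as described: condition on the full rule list at inference time and ask which rule, if any, best applies to a comment. Note the swap: ModQ trains dedicated QA models, whereas this sketch prompts a general chat model; the template and answer parsing are assumptions for illustration.

    from openai import OpenAI

    client = OpenAI()

    def best_matching_rule(rules, comment):
        options = "\n".join(f"{i}. {r}" for i, r in enumerate(rules))
        prompt = (
            "Community rules:\n" + options +
            f"\n{len(rules)}. No rule is violated." +
            f"\n\nComment: {comment}\n" +
            "Answer with the number of the single rule that best applies."
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        digits = "".join(ch for ch in resp.choices[0].message.content if ch.isdigit())
        idx = int(digits) if digits else len(rules)
        return rules[idx] if idx < len(rules) else None  # None = no violation found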
- Information Technology (1.00)
- Law (0.68)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Communication Bias in Large Language Models: A Regulatory Perspective
Kuenzler, Adrian, Schmid, Stefan
Large language models (LLMs) are a prominent subset of AI, built on advanced neural network architectures that can generate new data, including text, images, and audio. LLMs utilize various technologies to identify patterns in a given set of training data, without requiring explicit instructions about what to look for [12, 35]. LLMs typically assume that the training data follows a probability distribution, and once they have identified existing patterns, they can generate new instances that are similar to the original data. By drawing from and combining training data, LLMs can create new content that transcends the initial dataset [17].
- Asia > China > Hong Kong (0.04)
- Europe > Germany > Berlin (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (4 more...)
- Law > Statutes (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > Europe Government (0.93)
- (2 more...)